Implement unwrap_unchecked using transmutes when niche-optimizations are in play #102151

the8472 · 2022-09-22T19:41:53Z

No description provided.

rust-highfive · 2022-09-22T19:41:57Z

(rust-highfive has picked a reviewer for you, use r? to override)

rustbot · 2022-09-22T19:41:57Z

Hey! It looks like you've submitted a new PR for the library teams!

If this PR contains changes to any rust-lang/rust public library APIs then please comment with @rustbot label +T-libs-api -T-libs to tag it appropriately. If this PR contains changes to any unstable APIs please edit the PR description to add a link to the relevant API Change Proposal or create one if you haven't already. If you're unsure where your change falls no worries, just leave it as is and the reviewer will take a look and make a decision to forward on if necessary.

Examples of T-libs-api changes:

Stabilizing library features
Introducing insta-stable changes such as new implementations of existing stable traits on existing stable types
Introducing new or changing existing unstable library APIs (excluding permanently unstable features / features without a tracking issue)
Changing public documentation in ways that create new stability guarantees
Changing observable runtime behavior of library APIs

the8472 · 2022-09-22T19:42:47Z

@bors try @rust-timer queue

rust-timer · 2022-09-22T19:42:48Z

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

bors · 2022-09-22T19:42:56Z

⌛ Trying commit b852d54f74955dd93d08f60cb1d5f31fc17d1104 with merge a4bad50fb7be69e1d11952a70824ba9949d48ad7...

the8472 · 2022-09-22T21:21:38Z

@bors try

bors · 2022-09-22T21:21:46Z

⌛ Trying commit b3ca318d62aa09f0e6ce1e234b4932a98d53d724 with merge 3c5bba5b33e8613c13885b42641ada47186c9186...

bors · 2022-09-22T22:56:35Z

☀️ Try build successful - checks-actions
Build commit: 3c5bba5b33e8613c13885b42641ada47186c9186 (3c5bba5b33e8613c13885b42641ada47186c9186)

rust-timer · 2022-09-22T22:56:37Z

Queued 3c5bba5b33e8613c13885b42641ada47186c9186 with parent e7119a0, future comparison URL.

rust-timer · 2022-09-23T00:14:57Z

Finished benchmarking commit (3c5bba5b33e8613c13885b42641ada47186c9186): comparison URL.

Overall result: ❌ regressions - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean¹	range	count²
Regressions ❌ (primary)	0.4%	[0.2%, 1.0%]	21
Regressions ❌ (secondary)	0.5%	[0.1%, 1.1%]	10
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	0.4%	[0.2%, 1.0%]	21

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean¹	range	count²
Regressions ❌ (primary)	2.7%	[2.7%, 2.7%]	1
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-1.7%	[-1.7%, -1.7%]	1
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	0.5%	[-1.7%, 2.7%]	2

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean¹	range	count²
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-2.5%	[-2.8%, -2.2%]	2
Improvements ✅ (secondary)	-1.8%	[-1.8%, -1.8%]	1
All ❌✅ (primary)	-2.5%	[-2.8%, -2.2%]	2

¹ the arithmetic mean of the percent change
² number of relevant changes

thomcc · 2022-09-23T01:27:41Z

Hm, very surprising that this would have any overhead. I kind of suspect it's an artifact of the compilation process, but who knows.

scottmcm · 2022-09-23T03:05:45Z

Maybe it's just that the MIR for it is already pretty good. Essentially it's just

if compute_discriminant(self) == 1 {
    _0 = move ((_1 as Some).0: T);
} else {
    call std::hint::unreachable()
}

That memcpy is roughly the same to LLVM as the transmute-copy for a niched layout, and noticing that the condition isn't needed at all and dropping the unnecessary code is already pretty cheap for LLVM.

And rustc ends up emitting all the discriminant-calculation LLVM-IR anyway: folding the if away doesn't happen in the polymorphic MIR, so cg_llvm still needs to output that code, which seems to lose any advantage the transmute might have had from being easier for LLVM to understand.

Maybe that could be improved by making it if const { mem::size_of::<T>() == mem::size_of::<Self>() } (thanks, #96557!) and reviving some of #91222 to be smarter about the resulting constant branches in codegen.
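
The size check discussed here can be sketched as a free function. This is a minimal illustration, not the PR's code: the name `unwrap_unchecked_sketch` is hypothetical, a plain `if` on `size_of` stands in for the `if const { … }` form, and the transmute branch relies on the assumption (taken from the PR's SAFETY comment) that equal sizes mean a niche is in use and the payload occupies the whole `Option`.

```rust
use std::hint;
use std::mem::{self, ManuallyDrop};

/// Hypothetical helper for illustration only.
///
/// # Safety
/// The caller must guarantee that `opt` is `Some`.
pub unsafe fn unwrap_unchecked_sketch<T>(opt: Option<T>) -> T {
    if mem::size_of::<T>() == mem::size_of::<Option<T>>() {
        // Assumption: equal sizes imply a niche layout, so the bits of
        // `Some(v)` are exactly the bits of `v`.
        // `ManuallyDrop` keeps the original from being dropped after the copy.
        let opt = ManuallyDrop::new(opt);
        unsafe { mem::transmute_copy::<ManuallyDrop<Option<T>>, T>(&opt) }
    } else {
        // Fallback: the ordinary match with an unreachable `None` arm.
        match opt {
            Some(val) => val,
            None => unsafe { hint::unreachable_unchecked() },
        }
    }
}
```

With this shape, `Option<&T>` and `Option<Box<T>>` would take the transmute branch, while `Option<u64>` (which needs a separate discriminant) would take the match.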

scottmcm · 2022-09-23T03:10:28Z

library/core/src/option.rs

+            // SAFETY: Size equality implies niches are involved. And with niches
+            // transmutes are ok because they don't change bits, only make use of invalid values
+            unsafe {
+                let val = mem::transmute_copy(&self);


YMMV: if you want to save the separate forget call, I think you can write this as

Suggested change

-                let val = mem::transmute_copy(&self);
+                return mem::transmute_copy(&ManuallyDrop::new(self));

(Since forget is just putting it in a ManuallyDrop and ignoring it these days anyway.)
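
The two spellings in this suggestion can be compared side by side. The sketch below uses `Option<Box<u32>>`, for which the null-pointer layout is documented; the function names `via_forget` and `via_manually_drop` are hypothetical and only illustrate the pattern, they are not taken from the PR.

```rust
use std::mem::{self, ManuallyDrop};

/// Copy the bits out, then forget the original so it is not dropped twice.
///
/// # Safety
/// The caller must guarantee that `opt` is `Some`.
pub unsafe fn via_forget(opt: Option<Box<u32>>) -> Box<u32> {
    let val = unsafe { mem::transmute_copy::<Option<Box<u32>>, Box<u32>>(&opt) };
    mem::forget(opt);
    val
}

/// Same effect in one expression: wrap in `ManuallyDrop` first, then copy
/// the bits out of the wrapper, which never runs the destructor.
///
/// # Safety
/// The caller must guarantee that `opt` is `Some`.
pub unsafe fn via_manually_drop(opt: Option<Box<u32>>) -> Box<u32> {
    unsafe { mem::transmute_copy(&ManuallyDrop::new(opt)) }
}
```

Both versions leave exactly one owner of the allocation, so the box is freed once when the returned value is dropped.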

the8472 · 2022-09-23T17:58:19Z

@bors try @rust-timer queue

rust-timer · 2022-09-23T17:58:20Z

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

bors · 2022-09-23T17:58:27Z

⌛ Trying commit 7157cfe with merge 24860e60db7c91164ed67469532d69c3ab700541...

the8472 · 2022-09-23T18:19:35Z

> Maybe it's just that the MIR for it is already pretty good. Essentially it's just

But the LLVM-IR contains an assume which aiui is not without downsides. Transmuting avoids it entirely. https://rust.godbolt.org/z/4xorq6E5T
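
For anyone wanting to inspect the IR themselves, the match-based form under discussion looks roughly like the sketch below. The function names are hypothetical, and the non-generic wrapper exists only so that a compiler explorer has a concrete symbol to show; per the comment above, it is this form whose LLVM-IR carries the assume.

```rust
use std::hint;

/// Match-based unwrap: the `None` arm is declared unreachable.
///
/// # Safety
/// The caller must guarantee that `opt` is `Some`.
pub unsafe fn unwrap_unchecked_match<T>(opt: Option<T>) -> T {
    match opt {
        Some(val) => val,
        None => unsafe { hint::unreachable_unchecked() },
    }
}

/// Concrete instantiation (hypothetical) for inspecting generated code.
///
/// # Safety
/// The caller must guarantee that `opt` is `Some`.
pub unsafe fn unwrap_ref_match(opt: Option<&u32>) -> &u32 {
    unsafe { unwrap_unchecked_match(opt) }
}
```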

bors · 2022-09-23T19:19:20Z

☀️ Try build successful - checks-actions
Build commit: 24860e60db7c91164ed67469532d69c3ab700541 (24860e60db7c91164ed67469532d69c3ab700541)

rust-timer · 2022-09-23T19:19:22Z

Queued 24860e60db7c91164ed67469532d69c3ab700541 with parent 9a963e3, future comparison URL.

rust-timer · 2022-09-23T20:37:15Z

Finished benchmarking commit (24860e60db7c91164ed67469532d69c3ab700541): comparison URL.

Overall result: ❌ regressions - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean¹	range	count²
Regressions ❌ (primary)	0.4%	[0.2%, 0.7%]	34
Regressions ❌ (secondary)	0.5%	[0.2%, 1.1%]	13
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	0.4%	[0.2%, 0.7%]	34

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean¹	range	count²
Regressions ❌ (primary)	3.7%	[3.7%, 3.7%]	1
Regressions ❌ (secondary)	2.0%	[2.0%, 2.0%]	1
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	3.7%	[3.7%, 3.7%]	1

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean¹	range	count²
Regressions ❌ (primary)	3.2%	[3.2%, 3.2%]	1
Regressions ❌ (secondary)	2.6%	[2.6%, 2.6%]	1
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-3.9%	[-3.9%, -3.9%]	1
All ❌✅ (primary)	3.2%	[3.2%, 3.2%]	1

¹ the arithmetic mean of the percent change
² number of relevant changes

the8472 · 2022-09-23T20:58:41Z

compile-time losses are consistent with the previous run, but so are binary-size changes (especially in opt-full builds) and it is spending more time in LLVM, so it's having an effect on the optimizer, just not the one expected.

I'll take a look at the generated assembly; maybe it's diffable.

the8472 · 2022-09-24T14:06:46Z

This code is somewhere in RawVec; the right side is this branch.

I think llvm makes use of that assume downstream of a next_unchecked's result being re-packaged into another Option.
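
The re-packaging pattern mentioned here has roughly the following shape. This is a hypothetical sketch: `first_unchecked` stands in for a `next_unchecked`-style helper and is not code from the standard library, and the thread does not show which optimization LLVM actually performs on it.

```rust
/// Hypothetical stand-in for a `next_unchecked`-style helper.
///
/// # Safety
/// The caller must guarantee that `slice` is non-empty.
unsafe fn first_unchecked(slice: &[u32]) -> &u32 {
    unsafe { slice.iter().next().unwrap_unchecked() }
}

/// The unchecked result is immediately wrapped in a new `Option`.
/// Anything the optimizer learned from the unwrap (for example, that the
/// value is not the niche) is available when it builds this `Some`.
pub fn first_repacked(slice: &[u32]) -> Option<&u32> {
    if slice.is_empty() {
        None
    } else {
        Some(unsafe { first_unchecked(slice) })
    }
}
```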

rust-highfive assigned thomcc Sep 22, 2022

rustbot added the T-libs Relevant to the library team, which will review and decide on the PR/issue. label Sep 22, 2022

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Sep 22, 2022

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Sep 22, 2022

juntyr reviewed Sep 22, 2022

library/core/src/result.rs (outdated)

the8472 force-pushed the unwrap-transmute branch 2 times, most recently from 013a7f9 to b3ca318 Compare September 22, 2022 20:13

rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Sep 23, 2022

scottmcm reviewed Sep 23, 2022

Implement unwrap_unchecked using transmutes when niche-optimizations are in play (7157cfe)

the8472 force-pushed the unwrap-transmute branch from b3ca318 to 7157cfe Compare September 23, 2022 17:54

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Sep 23, 2022

rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Sep 23, 2022

the8472 closed this Sep 24, 2022

the8472 mentioned this pull request Mar 17, 2023

Permit the MIR inliner to inline diverging functions #106428

Merged

the8472 mentioned this pull request May 5, 2024

[Experiment] Replace unreachable_unchecked() with uninit().assume_init() in unwrap_unchecked() #124737

Closed
